Differential theory of learning for efficient neural network pattern recognition

Authors

  • J. B. Hampshire
  • Vijaya Kumar
Abstract

We describe a new theory of differential learning by which a broad family of pattern classifiers (including many well-known neural network paradigms) can learn stochastic concepts efficiently. We describe the relationship between a classifier's ability to generalize well to unseen test examples and the efficiency of the strategy by which it learns. We list a series of proofs that differential learning is efficient in its information and computational resource requirements, whereas traditional probabilistic learning strategies are not. The proofs are illustrated by a simple example that lends itself to closed-form analysis. We conclude with an optical character recognition task for which three different types of differentially generated classifiers generalize significantly better than their probabilistically generated counterparts.

1 DIFFERENTIAL LEARNING

A differentiable supervised classifier is one that learns an input-to-output mapping by adjusting a set of internal parameters via an iterative search aimed at optimizing a differentiable objective function (or empirical risk measure). Many well-known neural network paradigms are therefore differentiable supervised classifiers. The objective function is a metric that evaluates how well the classifier's evolving mapping from feature-vector space to classification space reflects the empirical relationship between the input patterns of the training sample and their class membership. Each of the classifier's discriminant functions g_i(X | θ) is a differentiable function of its parameters θ. We assume that there are C of these functions, corresponding to the C classes Ω = {ω_1, ..., ω_C} that the feature vector X can represent. These C functions are collectively known as the discriminator (see figure 1). Thus, the discriminator has a C-dimensional output Y with elements y_1 = g_1(X | θ), ..., y_C = g_C(X | θ). The classifier's output D(X | θ) = Γ(Y) ∈ Ω is simply the class label corresponding to the largest discriminator output, as shown in figure 1.

Figure 1: A diagrammatic view of the classifier and its associated functional mappings. The classifier input is a feature vector X; the C discriminator outputs y_1, ..., y_C correspond to the classes that X can represent; the class label D(X | θ) assigned to the input feature vector corresponds to the discriminator's largest output, Γ(Y) = ω_i such that y_i = max_j y_j. The figure is based on figure 2.3 of Duda & Hart [5].

Below we describe two fundamental strategies for supervised learning. The probabilistic strategy seeks to learn class (or concept) probabilities by optimizing a likelihood function or an error-measure objective function; the differential strategy is discriminative and seeks only to identify the most likely class by optimizing a classification figure-of-merit (CFM) objective function (see below). CFM objective functions are best described as differentiable approximations to a counting function: they count the number of correct classifications (or, equivalently, the number of incorrect classifications) the classifier makes on the training sample.
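The counting-function view of CFM objectives lends itself to a short sketch. The code below is illustrative rather than the paper's exact CFM functional form: each training example is scored by a steep sigmoid of the difference between the correct class's discriminator output and the largest competing output, so the average approaches the fraction of correctly classified examples as the sigmoid sharpens. The network outputs, labels, and steepness parameter are hypothetical.

```python
import numpy as np

def classify(y):
    """Decision rule Gamma(Y): the class whose discriminator output is largest."""
    return int(np.argmax(y))

def cfm_objective(Y, labels, steepness=10.0):
    """Differentiable approximation to the (normalized) count of correct classifications.

    Y      : (n, C) array of discriminator outputs y_1..y_C for n training examples.
    labels : (n,) array of true class indices.
    Each example contributes sigmoid(steepness * (y_true - max_{k != true} y_k)),
    which tends to 1 for a correctly classified example and to 0 otherwise, so the
    mean approaches the fraction of correct classifications as `steepness` grows.
    """
    n, C = Y.shape
    y_true = Y[np.arange(n), labels]
    # Mask the true class's output before taking the max over the competing outputs.
    competitors = np.where(np.eye(C, dtype=bool)[labels], -np.inf, Y)
    differentials = y_true - competitors.max(axis=1)
    return float(np.mean(1.0 / (1.0 + np.exp(-steepness * differentials))))

# Hypothetical usage: three discriminator outputs for four training examples.
Y = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.3, 0.4, 0.3],   # misclassified: true class is omega_3
              [0.6, 0.2, 0.2]])
labels = np.array([0, 1, 2, 0])
print("predicted classes:", [classify(y) for y in Y])
print("smoothed fraction correct:", round(cfm_objective(Y, labels), 3))
```

Because such a surrogate is a differentiable function of the discriminator outputs, it can drive the same gradient-based parameter search that a likelihood or error-measure objective would.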
The Bayes-optimal classifier is one that always associates X with its most likely class, thereby assuring the minimum probability of making a classification error (e.g., [5, ch. 2]). Any classifier that classifies X in this manner is said to yield Bayesian discrimination; equivalently, it is said to exhibit the Bayes error rate (i.e., the minimum probability of misclassification). Since the likelihood of the ith class ω_i for a particular value of the feature vector X is given by the a posteriori probability P_{Ω|X}(ω_i | X), one way the classifier will yield Bayesian discrimination is if its discriminant functions equal their corresponding a posteriori class probabilities. We refer to these C a posteriori class probabilities P_{Ω|X}(ω_1 | X), ..., P_{Ω|X}(ω_C | X) as the probabilistic form of the Bayesian discriminant function, F(X)_Bayes-Probabilistic. Probabilistic learning is the process by which the classifier's discriminant functions learn F(X)_Bayes-Probabilistic. As the training sample size n grows large, the empirical a posteriori class probabilities converge to their true values; if the classifier's discriminant functions possess sufficient functional complexity¹ to learn F(X)_Bayes-Probabilistic precisely, then

    \lim_{n \to \infty} g_i(X \mid \theta) = P_{\Omega|X}(\omega_i \mid X).    (1)

Figure 2 (left) shows F(X)_Bayes-Probabilistic for a 3-class random (scalar) feature² x. A bar graph below the a posteriori class probabilities of x depicts the class label that the Bayes-optimal classifier assigns to x over its effective domain. Note that the Bayes-optimal class label corresponds to the largest a posteriori class probability for each value of x. The right side of figure 2 depicts an equivalent, albeit different, form of the Bayesian discriminant function, δ_{Ω|X}(ω_1 | X), ..., δ_{Ω|X}(ω_C | X), which we call the differential form of the Bayesian discriminant function, F(X)_Bayes-Differential. It is derived from F(X)_Bayes-Probabilistic via the C stochastic linear transformations

    \delta_{\Omega|X}(\omega_i \mid X) = P_{\Omega|X}(\omega_i \mid X) - \max_{k \neq i} P_{\Omega|X}(\omega_k \mid X),    (2)

where δ_{Ω|X}(ω_i | X) denotes the ith a posteriori class differential. Note that the Bayes-optimal class label corresponds to the positive a posteriori class differential for each value of x. Differential learning is the process by which the classifier's discriminant functions learn F(X)_Bayes-Differential. Specifically, as the training sample size n grows large, the empirical a posteriori class differentials converge to their true values; if the classifier's discriminant functions possess sufficient functional complexity to learn F(X)_Bayes-Differential to at least one (sign) bit of precision, then in the limit the sign of each discriminant function agrees with the sign of the corresponding a posteriori class differential, which is all that Bayes-optimal classification requires.

Figure 2: The probabilistic (left) and differential (right) forms of the Bayesian discriminant function for the 3-class scalar feature x, with the Bayes-optimal class label plotted beneath each panel over the effective domain of x (roughly -2.7 to 2.7).

¹ A formal definition of functional complexity is beyond the scope of this paper. In simple terms, there is a limit to the intricacy of the mapping from feature-vector space to classification space implemented by a classifier with limited functional complexity.

² In this case x is a scalar; we use the notation X and x interchangeably to emphasize that our comments pertain to the general N-dimensional feature vector X.
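The relationship between the two forms can be checked with a small numerical sketch. The Gaussian class-conditional densities and equal priors below are hypothetical stand-ins for the 3-class scalar-feature example of figure 2 (they are not taken from the paper): the code computes the a posteriori class probabilities that probabilistic learning targets via equation (1), transforms them into the a posteriori class differentials of equation (2), and confirms that the single positive differential marks the Bayes-optimal class, which is why one (sign) bit of precision per discriminant function suffices.

```python
import numpy as np

# Hypothetical 3-class problem on a scalar feature x: Gaussian class-conditional
# densities p(x | omega_i) with equal priors. These stand in for the densities
# behind figure 2 and are illustrative only.
means  = np.array([-1.5, 0.0, 1.5])
sigmas = np.array([0.6, 0.6, 0.6])
priors = np.array([1.0, 1.0, 1.0]) / 3.0

def posteriors(x):
    """Probabilistic form: a posteriori class probabilities P(omega_i | x) via Bayes' rule."""
    likelihoods = np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
    joint = priors * likelihoods
    return joint / joint.sum()

def class_differentials(p):
    """Differential form, equation (2): delta_i = P(omega_i | x) - max_{k != i} P(omega_k | x)."""
    return np.array([p[i] - np.max(np.delete(p, i)) for i in range(len(p))])

for x in (-2.0, -0.3, 0.4, 2.0):
    p = posteriors(x)
    d = class_differentials(p)
    # The Bayes-optimal class is both the argmax of the posteriors and the index of
    # the single positive class differential: one sign bit per discriminant output.
    assert np.argmax(p) == np.flatnonzero(d > 0)[0]
    print(f"x = {x:+.1f}  P = {np.round(p, 3)}  delta = {np.round(d, 3)}  Bayes class = omega_{np.argmax(p) + 1}")
```

Only the sign pattern of the differentials matters to the decision rule, which is what makes the differential form a weaker (and, per the paper's argument, more efficiently learnable) target than the full posterior probabilities.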


Similar Resources

AN IMPROVED CONTROLLED CHAOTIC NEURAL NETWORK FOR PATTERN RECOGNITION

A sigmoid function is necessary for creating a chaotic neural network (CNN). In this paper, a new function for the CNN is proposed that can increase the speed of convergence. In the proposed method, we use a novel signal for controlling chaos. Both theoretical analysis and computer simulation results show that the performance of the CNN can be improved remarkably by using our method. By means of this...

Full text

Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheque processing, and conversion of any handwritten document into structured text form. The neural network (NN), with its inherent learning ability, offers promising solutions for handwritten characte...

Full text

Pattern Recognition in Control Chart Using Neural Network based on a New Statistical Feature

Today, to expedite the identification and timely correction of process deviations, it is necessary to use advanced techniques that minimize the cost of producing defective products. To this end, control charts, one of the important tools of statistical process control, have been used in combination with modern tools such as artificial neural networks. The artificial neural netw...

Full text

Numerical solution of fuzzy differential equations under generalized differentiability by fuzzy neural network

In this paper, we interpret a fuzzy differential equation using the strongly generalized differentiability concept, utilizing the generalized characterization theorem. A novel hybrid method based on the learning algorithm of a fuzzy neural network is then presented for the solution of differential equations with fuzzy initial values. Here the neural network is considered part of a large field called ne...

Full text

Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and a second momentum term in feed-forward neural networks. The analysis is conducted on 250 different words of three lowercase letters from the English alphabet. These words are presented to two vertical segmentation programs designed in MATLAB and based on portions (1...

Full text

Machine learning based Visual Evoked Potential (VEP) Signals Recognition

Introduction: Visual evoked potentials contain diagnostic information that has proved to be important for assessing the visual system's functional integrity. Owing to the substantial decrease in amplitude under extramacular stimulation in commonly used pattern VEPs, differentiating normal from abnormal signals can prove to be quite an obstacle. Due to developments in the use of machine l...

Full text

